AITopics | hyperparameter space


hyperparameter space


Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs

Neural Information Processing Systems | Jun-11-2026, 16:06:10 GMT

The selection of hyperparameters, such as prompt templates in large language models (LLMs), must often strike a balance between reliability and cost. In many cases, structural relationships between the expected reliability levels of the hyperparameters can be inferred from prior information and held-out data; for example, longer prompt templates may be more detailed and thus more reliable. However, existing hyperparameter selection methods either do not provide formal reliability guarantees or are unable to incorporate structured knowledge in the hyperparameter space. This paper introduces reliability graph-based Pareto testing (RG-PT), a novel multi-objective hyperparameter selection framework that maintains formal reliability guarantees in terms of false discovery rate (FDR), while accounting for known relationships among hyperparameters via a directed acyclic graph. Edges in the graph reflect expected reliability and cost trade-offs among hyperparameters, which are inferred via the Bradley-Terry (BT) ranking model from prior information and held-out data. Experimental evaluations demonstrate that RG-PT significantly outperforms existing methods such as learn-then-test (LTT) and Pareto testing (PT) through a more efficient exploration of the hyperparameter space.
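
As an illustration of the ranking step mentioned in the abstract, the following is a minimal sketch of fitting a Bradley-Terry model from pairwise comparison counts with the standard minorization-maximization update. It is not the RG-PT implementation: the function names, the toy win counts, and the chain-shaped graph built from the fitted strengths are assumptions made for this example, and the sketch assumes every item wins at least one comparison.

```python
def fit_bradley_terry(wins, n_iter=200, tol=1e-10):
    """Fit Bradley-Terry strengths with the standard MM update.

    wins[i][j] is the number of times item i was preferred over item j.
    Assumes every item has at least one win, so all strengths stay positive.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = sum(new_p)
        new_p = [v / scale for v in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


def reliability_edges(strengths, names):
    """Hypothetical helper: chain the items from strongest to weakest.

    Sorting by strength always yields an acyclic (chain-shaped) graph.
    """
    order = sorted(range(len(names)), key=lambda i: -strengths[i])
    return [(names[a], names[b]) for a, b in zip(order, order[1:])]


# Toy data (made up): three prompt templates, each pair compared four times.
names = ["long", "medium", "short"]
wins = [
    [0, 3, 4],  # "long" preferred over "medium" 3 times, over "short" 4 times
    [1, 0, 3],
    [0, 1, 0],
]
strengths = fit_bradley_terry(wins)
edges = reliability_edges(strengths, names)
```

Here an edge (a, b) only records that a was estimated to be more reliable than b; how RG-PT actually turns BT estimates into its directed acyclic graph is not described in the abstract.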

artificial intelligence, large language model, natural language, (9 more...)


Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)


Probabilistic Matrix Factorization for Automated Machine Learning

Nicolo Fusi, Rishit Sheth, Melih Elibol

Neural Information Processing Systems | Feb-14-2026, 06:42:59 GMT

Moreover, given the ever-increasing number of machine learning models being developed, model selection is becoming increasingly important.

artificial intelligence, dataset, machine learning, (19 more...)


Country:

Europe > United Kingdom > England (0.05)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


9d86d83f925f2149e9edb0ac3b49229c-Supplemental.pdf

Neural Information Processing Systems | Feb-10-2026, 06:23:59 GMT

dataset, ft-transformer, hyperparameter space, (12 more...)


Country: North America > United States > California (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)


12ced2db6f0193dda91ba86224ea1cd8-Paper.pdf

Neural Information Processing Systems | Feb-7-2026, 13:36:19 GMT

hyperparameter, hyperparameter space, pdp, (17 more...)


Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
Europe > Germany > Lower Saxony > Hanover (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)


Learning to Mutate with Hypergradient Guided Population

Neural Information Processing Systems | Dec-24-2025, 15:22:23 GMT

Computing the gradient of model hyperparameters, i.e., hypergradient, enables a promising and natural way to solve the hyperparameter optimization task. However, gradient-based methods could lead to suboptimal solutions due to the non-convex nature of optimization in a complex hyperparameter space. In this study, we propose a hyperparameter mutation (HPM) algorithm to explicitly consider a learnable trade-off between using global and local search, where we adopt a population of student models to simultaneously explore the hyperparameter space guided by hypergradient and leverage a teacher model to mutate the underperforming students by exploiting the top ones. The teacher model is implemented with an attention mechanism and is used to learn a mutation schedule for different hyperparameters on the fly. Empirical evidence on synthetic functions is provided to show that HPM outperforms hypergradient significantly. Experiments on two benchmark datasets are also conducted to validate the effectiveness of the proposed HPM algorithm for training deep neural networks compared with several strong baselines.
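
The interplay of local and global search described in the abstract can be sketched on a toy problem. The following is a simplified illustration, not the HPM algorithm itself: the learned, attention-based teacher is replaced by a fixed Gaussian perturbation of the top-performing students, and the objective, its analytic gradient (standing in for the hypergradient), and all names and constants are assumptions made for this example.

```python
import math
import random


def loss(h):
    # Toy non-convex objective standing in for a validation loss.
    return 0.1 * h * h + math.sin(3.0 * h)


def grad(h):
    # Analytic derivative of the toy loss, standing in for the hypergradient.
    return 0.2 * h + 3.0 * math.cos(3.0 * h)


def hpm_like_search(pop_size=8, steps=200, lr=0.01, mutate_every=20, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    for step in range(1, steps + 1):
        # Local search: every student takes one gradient step.
        population = [h - lr * grad(h) for h in population]
        if step % mutate_every == 0:
            # Global search: overwrite the worse half with perturbed copies
            # of the better half (fixed noise here, not a learned schedule).
            population.sort(key=loss)
            half = pop_size // 2
            for k in range(half, pop_size):
                parent = population[k - half]
                population[k] = parent + rng.gauss(0.0, 0.5)
    return min(population, key=loss)


best = hpm_like_search()
print(f"best hyperparameter: {best:.3f}, loss: {loss(best):.3f}")
```

Because the better half of the population is never overwritten and the step size is small relative to the curvature of the toy loss, the best loss in the population does not increase over time in this sketch.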

hypergradient guided population, learning, name change, (6 more...)


Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)


Efficient Hyperparameter Tuning via Trajectory Invariance Principle

Bingrui Li, Jiaxin Wen, Zhanpeng Zhou, Jun Zhu, Jianfei Chen

arXiv.org Artificial Intelligence | Sep-30-2025

As hyperparameter tuning becomes increasingly costly at scale, efficient tuning methods are essential. Yet principles for guiding hyperparameter tuning remain limited. In this work, we seek to establish such principles by considering a broad range of hyperparameters, including batch size, learning rate, and weight decay. We identify a phenomenon we call trajectory invariance, where pre-training loss curves, gradient noise, and gradient norm are invariant (the curves closely overlap) with respect to a quantity that combines learning rate and weight decay. This phenomenon effectively reduces the original two-dimensional hyperparameter space to one dimension, yielding an efficient tuning rule: follow the salient direction revealed by trajectory invariance. Furthermore, we refine previous scaling laws and challenge several existing viewpoints. Overall, our work proposes new principles for efficient tuning and inspires future research on scaling laws.
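
To make the dimensionality reduction concrete, the sketch below groups a small learning-rate and weight-decay grid by a single combined quantity. The abstract does not state how the two are combined, so the product used here is purely a placeholder assumption, as are the function names and grid values.

```python
def combined(lr, wd):
    # ASSUMPTION: placeholder combination. The abstract does not give the
    # actual quantity that combines learning rate and weight decay.
    return lr * wd


def group_by_combined(configs, ndigits=12):
    """Group (lr, wd) pairs that share the same combined value."""
    groups = {}
    for lr, wd in configs:
        key = round(combined(lr, wd), ndigits)  # rounding absorbs float noise
        groups.setdefault(key, []).append((lr, wd))
    return groups


# Hypothetical 3 x 3 grid of (learning rate, weight decay) pairs.
grid = [(lr, wd) for lr in (1e-3, 2e-3, 4e-3) for wd in (0.1, 0.2, 0.4)]
groups = group_by_combined(grid)
for value, members in sorted(groups.items()):
    print(value, members)
```

If training trajectories really depend only on the combined value, then one run per group would suffice, and the nine grid points above collapse to five distinct values along a single axis.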

large language model, machine learning, natural language, (17 more...)


arXiv:2509.25049

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)


Supplementary material A with for numerical features

Neural Information Processing Systems | Aug-17-2025, 07:07:47 GMT

We provide a visual explanation of how embeddings are passed to the MLP in Figure 2 and Figure 3. We also provide a visualisation of target-aware PLE (subsubsection 3.2.2) in Figure 4. Figure 4: Obtaining bins for PLE from decision trees. We used the following datasets: Gesture Phase Prediction (Madeo et al. [27]); Churn Modeling. We follow the pointwise approach to learning-to-rank and treat this ranking problem as a regression problem. In this section, we apply the quantile-based piecewise linear encoding (described in subsubsection 3.2.1) to MLP and Transformer on the synthetic GBDT-friendly dataset described in section 5.1. The results are visualized in Figure 5. In this section, we test Fourier features implemented exactly as in Tancik et al. We mostly follow Gorishniy et al. [13] in terms of the tuning, training and evaluation protocols.
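
For readers unfamiliar with the quantile-based piecewise linear encoding (PLE) mentioned above, here is a minimal sketch. It assumes the usual definition, in which each bin's component is 1 once the value has passed the bin, 0 before the value reaches it, and a linear fraction inside it; that definition, the function names, and the toy data are assumptions for this example rather than details taken from the excerpt. Bin edges are assumed to be strictly increasing.

```python
import statistics


def quantile_bin_edges(values, n_bins):
    """Bin edges at empirical quantiles, including the min and the max."""
    inner = statistics.quantiles(values, n=n_bins, method="inclusive")
    return [min(values)] + inner + [max(values)]


def piecewise_linear_encode(x, edges):
    """Encode scalar x as one component per bin (assumed PLE definition)."""
    n_bins = len(edges) - 1
    encoded = []
    for t in range(1, n_bins + 1):
        lo, hi = edges[t - 1], edges[t]
        if x < lo and t > 1:
            encoded.append(0.0)   # x has not reached this bin yet
        elif x >= hi and t < n_bins:
            encoded.append(1.0)   # x is already past this bin
        else:
            encoded.append((x - lo) / (hi - lo))  # x falls inside this bin
    return encoded


# Toy feature values 0..100 split into four quantile bins.
edges = quantile_bin_edges(list(range(101)), n_bins=4)
print(edges)
print(piecewise_linear_encode(60, edges))
```

With these toy values the edges fall at 0, 25, 50, 75 and 100, so the value 60 fills the first two bins completely and 40% of the third.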

artificial intelligence, dataset, machine learning, (17 more...)


Country: North America > United States > California (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Learning to Mutate with Hypergradient Guided Population

Neural Information Processing Systems | Aug-16-2025, 12:25:12 GMT

Computing the gradient of model hyperparameters, i.e., hypergradient, enables a promising and natural way to solve the hyperparameter optimization task.

hyperparameter, mutation, optimization, (17 more...)


Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)


Supplementary material

Neural Information Processing Systems | Aug-16-2025, 10:02:05 GMT

All the experiments were conducted under the same conditions in terms of software versions. The feature preprocessing for DL models is described in the main text. The preprocessing is then applied to the original features. The remaining notation follows that of the main text. For most experiments, training times can be found in the source code.

artificial intelligence, hyperparameter space, machine learning, (14 more...)


Country: North America > United States > California (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
